Potential-Based Reward Shaping for POMDPs (Extended Abstract)
Authors
Abstract
We address the problem of suboptimal behavior caused by short horizons during online POMDP planning. Our solution extends potential-based reward shaping from the related field of reinforcement learning to online POMDP planning in order to improve planning without increasing the planning horizon. In our extension, information about the quality of belief states is added to the function optimized by the agent during planning. This information provides hints about where the agent might find high future rewards, helping it achieve greater cumulative reward.
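Potential-based shaping conventionally adds a term of the form F(b, a, b') = gamma * Phi(b') - Phi(b), here applied to belief states rather than fully observed states. The Python sketch below shows one way such a term could be folded into a short-horizon rollout; it is illustrative only, and the names potential, goal_proximity, update_belief, and shaped_rollout are hypothetical stand-ins, not the paper's implementation.

import random

def goal_proximity(state):
    # Placeholder domain heuristic; replace with problem-specific knowledge.
    return 0.0

def potential(belief):
    # Hypothetical potential Phi(b): the expected heuristic value under the
    # belief, a cheap estimate of how promising a belief state is.
    return sum(p * goal_proximity(s) for s, p in belief.items())

def sample_state(belief):
    # Draw a state from the belief (a dict mapping state -> probability).
    states, probs = zip(*belief.items())
    return random.choices(states, weights=probs, k=1)[0]

def shaped_rollout(belief, simulator, update_belief, policy, gamma=0.95, horizon=5):
    # Evaluate a rollout under the shaped reward
    #     r'(b, a, b') = r(s, a) + gamma * Phi(b') - Phi(b),
    # so hints about belief quality guide the short-horizon planner.
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        action = policy(belief)
        state = sample_state(belief)
        _, observation, reward = simulator(state, action)
        next_belief = update_belief(belief, action, observation)
        shaped = reward + gamma * potential(next_belief) - potential(belief)
        total += discount * shaped
        discount *= gamma
        belief = next_belief
    return total

Because the gamma * Phi(b') - Phi(b) terms telescope along a trajectory, the shaping contribution collapses to a boundary term gamma^H * Phi(b_H) - Phi(b_0); it redirects the planner's short-horizon comparisons toward promising beliefs without accumulating reward over long runs.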
Similar resources
Abstract MDP Reward Shaping for Multi-Agent Reinforcement Learning
Kyriakos Efthymiadis, Sam Devlin and Daniel Kudenko (Department of Computer Science, The University of York, UK). Reward shaping has been shown to significantly improve an agent’s performance in reinforcement learning. As attention is shifting from tabula-rasa approaches to methods where some heuristic domain knowledge can be give...
A comparison of plan-based and abstract MDP reward shaping
Reward shaping has been shown to significantly improve an agent’s performance in reinforcement learning. As attention is shifting away from tabula-rasa approaches, many different reward shaping methods have been developed. In this paper we compare two different methods for reward shaping: plan-based, in which an agent is provided with a plan and extra rewards are given according to the steps of ...
Improved Planning for Infinite-Horizon Interactive POMDPs using Probabilistic Inference (Extended Abstract)
We provide the first formalization of self-interested multiagent planning using expectation-maximization (EM). Our formalization in the context of infinite-horizon and finitely-nested interactive POMDPs (I-POMDPs) is distinct from EM formulations for POMDPs and other multiagent planning frameworks. Specific to I-POMDPs, we exploit the graphical model structure and present a new approach based on b...
Refining Diagnostic POMDPs with User Feedback
Bayesian networks have been widely used for diagnostics. These models can be extended to POMDPs to select the best action. This allows modeling partial observability due to causes and the utility of executing various tests. We describe the problem of refining diagnostic POMDPs when user feedback is available. We propose utilizing user feedback to pose constraints on the model, i.e., the transit...
Potential-based difference rewards for multiagent reinforcement learning
Difference rewards and potential-based reward shaping can both significantly improve the joint policy learnt by multiple reinforcement learning agents acting simultaneously in the same environment. Difference rewards capture an agent’s contribution to the system’s performance. Potential-based reward shaping has been proven to not alter the Nash equilibria of the system but requires domain-speci...